Reflected generalized concentration addition and Bayesian hierarchical models to improve chemical mixture prediction

Environmental toxicants overwhelmingly occur together as mixtures. The variety of possible chemical interactions makes it difficult to predict the danger of the mixture. In this work, we propose the novel Reflected Generalized Concentration Addition (RGCA), a piece-wise, geometric technique for sigmoidal dose-responsed inverse functions that extends the use of generalized concentration addition (GCA) for 3+ parameter models. Since experimental tests of all relevant mixtures is costly and intractable, we rely only on the individual chemical dose responses. Additionally, RGCA enhances the classical two-step model for the cumulative effects of mixtures, which assumes a combination of GCA and independent action (IA). We explore how various clustering methods can dramatically improve predictions. We compare our technique to the IA, CA, and GCA models and show in a simulation study that the two-step approach performs well under a variety of true models. We then apply our method to a challenging data set of individual chemical and mixture responses where the target is an androgen receptor (Tox21 AR-luc). Our results show significantly improved predictions for larger mixtures. Our work complements ongoing efforts to predict environmental exposure to various chemicals and offers a starting point for combining different exposure predictions to quantify a total risk to health.


RGCA Details 1.RGCA formula for Negative sill
The piecewise inverse RGCA function for the case of a negative sill is provided here:

Isobolograms
We illustrate the effect of different relative values of a sill α and slope β in a simulated binary mixture in figure 1.The top row shows different sill values while the right column shows different slope values.The reference chemical has a sill α = 5, EC 50 of θ = 1 and slope β = 1, 1 while the EC 50 for our alternate chemical is θ = 0.7.When the sills and slopes are equal we recover CA and when just the slopes are equal we have GCA.The isoboles are extended in an intuitive way when the slopes vary beyond β = 1.where the other chemical has a sill of 5.The right side shows the slope parameter, while the other chemical has a slope of 1.The center plot in the rightmost column is CA, the center row is GCA, and the lower and upper rows show how our reflection method extends GCA for non-unit slopes when using the Hill function.
The anomalies in the first column are due to the negative sill, which leads to multiple solutions (slope<1) or no solutions (slope>1) in GCA.

Proof Sketch, existence of GCA solution with positive sills
The GCA Eq 2 consists of concentrations c i with coefficients 1/f −1 i (R).When plugging in positive values for R, the coefficients have range [−1/(2θ i ), ∞), where R → ∞ results in −1/(2θ i ) and R → 0 leads to a coefficient of ∞.This can be seen by first checking that the inverse function This means that, for very large R, all of the concentrations get a negative coefficient, while decreasing R will continuously and monotonically increase all coefficients to positive infinity.This means the left side of Eq 2 will continuously vary from a negative value to positive infinity.Hence there exists (by continuity) a unique (by monotonicity) value for R that will yield a sum of 1.
When negative sills are present, the uniqueness is no longer guaranteed.As illustrated in Fig 2, depending on the magnitude of the negative sill and slope, the monotonicity of the combined coefficients can be disrupted in the form of a dip or bump that results in up to two additional possible solutions that solve Eq 2.

Bayesian Model Specification
For our Bayesian hierarchical model, we use weakly informative priors to provide some penalization of extreme parameter values while avoiding the issues of noninformative uniform priors, which can either lead to maximum likelihood fits with extreme values or can be sensitive to boundaries [1].

Hierarchical Model, Random Effects
For identifiability, we fix u 1j = v 1j = 0 for the first replicate of each chemical.Weakly informative conjugate inverse gamma priors are chosen for σ u , σ v , σ, with hyperparameters chosen to yield a high variance: IG(0.1, 0.1).
A Gibbs sampling procedure can be used for the effects a i , u ij , v ij and variance terms σ u , σ v , σ because they have conjugate priors and closed-form posteriors.The remaining terms θ and β occur as nonlinear elements and are sampled via Metropolis-Hastings.The slope term β has proposals generated from truncated normal distributions centered at the previous value.The possible values for the EC 50 parameter covers multiple orders of magnitude, so the proposals are generated following a lognormal distribution: a small Gaussian step is made from the log of the current estimate, and then that proposal is exponentiated to provide a new candidate parameter.

MCMC convergence
We perform a Gelman-Rubin tests of convergence and mixing to determine if our model fitting procedure is actually becoming stationary or mixing poorly.The test statistic assumes a Gaussian posterior, which requires a log-transform for some chemicals.The results are in table 1 and suggest that most of the chains converge reasonably well.Some chemicals like chemical 17, benzyl butyl phthalate, mix poorly as a result of identifiability issues described in Figure 2 and in Section 3.2.MCMC trace plots for the sill, E50, and slope parameters show samples drawn from the posterior after a burn-in of 5000 iterations and thinning the remaining 20,000 iterations down to 5000 points.The EC 50 for chemical 17 (benzyl butyl phthalate) was the only trace that failed to converge to a stationary distribution according to the Gelman-Rubin diagonostic, but other parameters such as the EC 50 for chemical 1 do not have ideal trace plots.Failures to reach a stationary state for some chemicals and parameters suggests an identifiability issue where there is no actual response in the data.

Mixture Predictions
In this section we show a few representative examples of dose response prediction.For the full mixtures, our method often shows good coverage of the observed data, but sometimes IA or GCA would better predict the mean effect.Some smaller mixtures were not predicted correctly by any method due to synergy or antagonism.However, our method has the capability to consider multiple clusterings and in some cases generates credible intervals that are wide enough to cover the observed data.

Figure 1 :
Figure1: Isobole plots for various parameters.Isobole plots showing lines of equal response for a mixture of two chemicals with various parameter settings.The top row shows the sill for chemical 1 (c1), where the other chemical has a sill of 5.The right side shows the slope parameter, while the other chemical has a slope of 1.The center plot in the rightmost column is CA, the center row is GCA, and the lower and upper rows show how our reflection method extends GCA for non-unit slopes when using the Hill function.The anomalies in the first column are due to the negative sill, which leads to multiple solutions (slope<1) or no solutions (slope>1) in GCA.

IndexFigure 2 :
Figure2: MCMC stationarity.MCMC trace plots for the sill, E50, and slope parameters show samples drawn from the posterior after a burn-in of 5000 iterations and thinning the remaining 20,000 iterations down to 5000 points.The EC 50 for chemical 17 (benzyl butyl phthalate) was the only trace that failed to converge to a stationary distribution according to the Gelman-Rubin diagonostic, but other parameters such as the EC 50 for chemical 1 do not have ideal trace plots.Failures to reach a stationary state for some chemicals and parameters suggests an identifiability issue where there is no actual response in the data.

Figure 4 :
Figure 4: Examples of mixture dose response curves plotted with replicates.13

Table 1 :
Tests for MCMC stationarity.Gelman-Rubin Diagnostic statistics for the MCMC trace plots of the model parameters of interest.Values near 1 suggest convergence to a stationary distribution, while values above 2 suggest a failure to reach stationarity.Values computed with three chains of 25,000 iterations each.Other parameters showed diagnostic values near 1 and are not included to be concise

Table 2 :
Full posterior summary statistics.Parameter estimates from the MCMC RE model, computed as the posterior median.The values are rounded to 2 digits or 3 significant digits depending on scale.RE SD refers to the Sill random effect standard deviation and Int SD refers to the intercept random effect standard deviation (the intercept is assumed to be 0).Poster variance estimates are available for all parameters on request.

Table 3 :
MSE scores for Tox21.Mean squared error values for the Tox21 AR-luc mixture predictions using the methods described in the main text.RGCA uses no clustering, RGCA Sill uses the postive sills as one cluster and all other sills as another cluster, and RGCA KM refers to K-means clustering, RGCA Rand refers to random clustering.

Table 4 :
CRPS scores for Tox21.Continuous rank probability scores (CRPS) for the Tox21 AR-luc mixture predictions using the methods described in the main text.RGCA uses no clustering, RGCA Sill uses the postive sills as one cluster and all other sills as another cluster, and RGCA KM refers to K-means clustering, RGCA Rand refers to random clustering.

Table 5 :
LLH scores for Tox21.Log likelihood scores for the Tox21 AR-luc mixture predictions using the methods described in the main text.RGCA uses no clustering, RGCA Sill uses the postive sills as one cluster and all other sills as another cluster, RGCA KM refers to K-means clustering, and RGCA Rand refers to random clustering.GCA, IA, and CA are predicted without uncertainty and have infinite LLH.

Table 6 :
Tox21 Mixture Descriptions Includes placeholder CAS numbers used in plots for reference.